Highlights

A prediction method for coronaviral 3CLpro cleavage sites is proposed that balances
accuracy against false positives.

3 of the 9 putative non-canonical cleavage sites were verified; they are located
upstream of nsp4.

All 11 canonical cleavage sites of MERS-CoV 3CLpro were confirmed and the
Michaelis constants were calculated.
Abstract

Coronavirus 3C-like protease (3CLpro) is responsible for the cleavage of coronaviral
polyprotein 1a/1ab (pp1a/1ab) to produce the mature non-structural proteins (nsps)
nsp4-16. The nsp5 of the newly emerging Middle East respiratory syndrome
coronavirus (MERS-CoV) was identified as 3CLpro and its canonical cleavage sites
(between nsps) were predicted based on sequence alignment, but the cleavability of
these cleavage sites remains to be experimentally confirmed and the putative
non-canonical cleavage sites (inside one nsp) within pp1a/1ab await further
analysis. Here, we propose a method for predicting coronaviral 3CLpro cleavage
sites that balances prediction accuracy against false positive outcomes. By applying
this method to MERS-CoV, the 11 canonical cleavage sites were readily identified and
verified by biochemical assays. The Michaelis constants of the canonical cleavage
sites of MERS-CoV showed that the substrate specificity of MERS-CoV 3CLpro is
relatively conserved. Interestingly, 9 putative non-canonical cleavage sites were
predicted and three of them could be cleaved by MERS-CoV nsp5. These results pave
the way for identification and functional characterization of new nsp products of
coronaviruses.

Keywords: MERS-CoV; 3C-like protease; Canonical cleavage sites; Non-canonical
cleavage sites; Michaelis constants.
Introduction

Middle East respiratory syndrome coronavirus (MERS-CoV) is an enveloped virus
carrying a genome of positive-sense RNA (+ssRNA). It was identified as the pathogen
of an outbreak of a new viral respiratory disease, named Middle East respiratory
syndrome (MERS), in Saudi Arabia in June 2012. MERS-CoV emerged ten years after
severe acute respiratory syndrome coronavirus (SARS-CoV) (Zaki et al., 2012) and
quickly spread to several countries in the Middle East and Europe (Assiri et al., 2013;
Tashani et al., 2014). Soon after the first report, the MERS-CoV genome was
sequenced and its genomic organization elucidated (van Boheemen et al.,
2012). This new coronavirus is classified in lineage C of betacoronavirus and is
closely related to bat coronaviruses HKU4 and HKU5 (de Groot et al., 2013; Lau et al.,
2013). Like other coronaviruses (Hussain et al., 2005; Zuniga et al., 2004), MERS-CoV
contains a 3' coterminal, nested set of seven subgenomic RNAs (sgRNAs), enabling
translation of at least 9 open reading frames (ORFs). The 5'-terminal two thirds of
the MERS-CoV genome contains a large open reading frame, ORF1ab, which encodes
polyprotein 1a (pp1a, 4391 amino acids) and polyprotein 1ab (pp1ab, 7078 amino
acids), the latter being translated via a -1 ribosomal frameshift at the end of ORF1a.
These two polyproteins were predicted to be subsequently processed into 16
non-structural proteins (nsps) by nsp3, a papain-like protease (PLpro), and nsp5, a
3C-like protease (3CLpro) (Kilianski et al., 2013; van Boheemen et al., 2012).


Proteases play a key role in the virus life cycle. They are essential for viral
replication because they mediate the maturation of the viral replicases, and have thus
become targets of potential antiviral drugs (Thiel et al., 2003; Ziebuhr et al., 2000).
Investigating the cleavage sites of coronavirus proteases and the processing of
polyproteins pp1a/1ab will help to identify the viral proteins and their potential
functions in viral replication. Some cleavage sites have been identified and confirmed
by previous studies, including three cleavage sites of the PLpros of human coronavirus
229E (HCoV 229E), mouse hepatitis virus (MHV), SARS-CoV, MERS-CoV and infectious
bronchitis virus (IBV), whose cleavage releases the first 3 non-structural proteins
(Bonilla et al., 1995; Kilianski et al., 2013; Lim and Liu, 1998; Ziebuhr et al., 2007).
The canonical cleavage sites of 3CLpros, the sites between the recognized nsps, have
also been characterized, including all sites of MHV, IBV and SARS-CoV and a fraction
of the sites of HCoV 229E, which release the non-structural proteins from nsp4 to
nsp16 (Deming et al., 2007; Grotzinger et al., 1996; Liu et al., 1994; Liu et al., 1997;
Lu et al., 1995). For the 3CLpro of MERS-CoV, two cleavage sites releasing nsp4 to nsp6
have been identified (Kilianski et al., 2013). However, the other cleavage sites remain
to be characterized.


Furthermore, efforts have been made to predict these cleavage sites by sequence
comparison. Gorbalenya et al. made the first systematic prediction on IBV
pp1a/1ab according to the substrate specificity of the 3C proteases of picornaviruses
(Gorbalenya et al., 1989). However, two of their predicted cleavage sites within nsp6
of IBV proved uncleavable (Liu et al., 1997; Ng and Liu, 2000). Gao et al.
developed a software tool (ZCURVE_CoV) to predict the nsps as well as gene-encoded
ORFs of coronaviruses more accurately, based on previous studies of the 3CLpro
cleavage sites of IBV, MHV and HCoV 229E (Gao et al., 2003). Later on,
non-orthogonal decision trees were used to mine the coronavirus protease cleavage
data and to improve the sensitivity and accuracy of prediction (Yang, 2005). However,
because these methods focus on predicting the canonical cleavage sites and
increasingly prioritize accuracy to avoid false positives, potential
non-canonical cleavage sites might be neglected. For example, a cleavage site
between nsp7-8 of MHV strain A59 is not predicted by the above methods but proved to
be physiologically important, since it produces a shorter nsp7 that can support the
growth of MHV carrying a mutation at the nsp7-8 cleavage site (Deming et al., 2007).
The substrate specificities of coronavirus 3CLpros are therefore complicated. A
3CLpro substrate library for four coronaviruses (HCoV-NL63, HCoV-OC43,
SARS-CoV and IBV) containing 19 amino acids × 8 positions variants was
constructed by making single amino acid (aa) substitutions at each position from P5 to
P3', and the cleavage efficiencies were measured and analyzed to find the most
preferred residues at each position (Chuck et al., 2011). However, a non-canonical
cleavage site with less preferred residues is nevertheless used by coronaviruses
(Deming et al., 2007). Thus we speculate that other potential 3CLpro cleavage sites
may still exist in coronaviruses.


In order to set up more moderate and balanced criteria for protease cleavage site
identification, we compared 6 scanning conditions of different stringency to
systematically predict the 3CLpro cleavage sites on pp1a/1ab of 5 coronaviruses
including MERS-CoV. As a representative, the cleavability of the predicted cleavage
sites of MERS-CoV 3CLpro was analyzed by the recombinant luciferase cleavage
assay and the fluorescence resonance energy transfer (FRET) assay. The results
showed that all 11 canonical cleavage sites of MERS-CoV pp1a/1ab were cleavable in
our experiments and that 3 of the 9 predicted non-canonical cleavage sites appeared to
be cleavable. Our study points out a new direction for the prediction and
identification of protease cleavage sites and contributes to understanding the
mechanism of coronaviral polyprotein processing.


Materials and Methods 

Information collection of coronavirus 3CLpro cleavage sites. The genome
sequences of 28 coronaviruses were downloaded from the GenBank database and the
sequences of the 3CLpro cleavage sites were collected from P4 to P2' (Table S1 to
Table S4). The substrate profiles of each coronavirus group and of the whole
Coronavirinae were summarized (Table S5).


Construction of recombinant 3CLpro expression vectors. The coding sequence of
MERS-CoV nsp5 (NC_019843) was chemically synthesized by GenScript and cloned
into the vectors pET28a and pGEX-6p-1, respectively. The catalytic residue mutation
C148A was generated by overlapping PCR with mutagenic primers (Table S6). All
clones and mutations were confirmed by DNA sequencing.


Expression and purification of recombinant proteins. The expression vectors were
transformed into Escherichia coli strain BL21 (DE3). The cells were grown at 37°C in
lysogeny broth (LB) medium with antibiotics and induced with 0.2 mM
isopropyl β-D-thiogalactopyranoside (IPTG) at 16°C for 12 hours. The cells were
harvested and resuspended in lysis buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1
mM EDTA, 0.05% NP40, 0.1 mg/ml lysozyme and 1 mM PMSF) at 4°C. After
incubation for 30 min on ice, 10 mM MgCl2 and 10 µg/ml DNase I (Sigma) were
added to digest the genomic DNA. After centrifugation, the supernatant of the cell
lysate was applied to an affinity chromatography column. The His-tagged recombinant
protein was bound to nickel-nitrilotriacetic acid (Ni-NTA) resin (GenScript) and
washed with buffer A (50 mM Tris-HCl, pH 7.5, 150 mM NaCl), buffer B (50 mM
Tris-HCl, pH 7.5, 150 mM NaCl, 20 mM imidazole) and buffer C (50 mM Tris, pH
7.5, 150 mM NaCl, 50 mM imidazole). Proteins were eluted with buffer D (50 mM
Tris, pH 7.5, 150 mM NaCl, 250 mM imidazole). GST-tagged protein was bound to
GST resin (GenScript), washed with buffer A and eluted with buffer A supplemented
with 10 mM reduced glutathione (GSH). The purified proteins were desalted and
concentrated by ultrafiltration using 30 kDa Amicon Ultra 0.5-ml centrifugal filters
(Millipore).


Luciferase-based biosensor assay. All the cleavage sites (8 residues, ranging from
P5 to P3') were inserted into the Glo-Sensor 10F linear vector. Compared with wild
type firefly luciferase (550 aa), Glo-Sensor luciferase has short truncations at both
termini and its C- and N-terminal parts are swapped, resulting in new 234-aa
N-terminal and 233-aa C-terminal regions, respectively. The inserted sequence and the
reversed arrangement of the N- and C-terminal regions reduce the luciferase activity
dramatically. Once the recognition sequence is cut by nsp5, the luciferase recovers
its activity and emits luminescence in the presence of luciferase substrate. The
back-to-front recombinant firefly luciferase carrying the different cleavage sites was
expressed by incubating the recombinant plasmids with a cell-free protein expression
system extracted from wheat germ (Promega). After incubation for 2 hours at 25°C,
nsp5 was added and the whole system was incubated at 30°C for 1 hour. Then,
the reaction system was diluted 20 times and mixed thoroughly with an equal volume of
luciferase substrate. Luciferase luminescence was measured with a luminometer
(Promega) after incubation for 5 min at room temperature.


Peptide-based FRET assay. All 11 conserved putative recognition sites were
designed from P12 to P8', synthesized and modified with a typical shorter-wavelength
FRET pair, N-terminal DABCYL and C-terminal Glu-EDANS, by GL Biochem
(Shanghai). The peptides were completely dissolved in DMSO and the final
concentration of DMSO in the reaction system was 1%. Substrate peptide (180 µM) and
tagged nsp5 (16.3 µM) were mixed in a solution of 50 mM Tris, pH 7.5, 1 mM EDTA,
50 µM DTT and incubated at 37°C for 2 hours. To calculate kcat/Km, different
amounts (7.2 µM - 180 µM) of substrate peptide were co-incubated with 16.3 µM
nsp5. The reaction system was placed in a Greiner black plate and the fluorescence was
detected by a microplate reader (Molecular Devices) with Ex/Em (nm/nm) = 340/490.
Relative fluorescence units (RFU) were recorded every 30 sec for 2 hours.
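Both assays use substrates defined by positions relative to the scissile bond: 8
residues (P5 to P3') for the luciferase inserts and 20 residues (P12 to P8') for the
FRET peptides. The following is a minimal sketch of how such a window could be cut out
of a polyprotein sequence. The function name `site_window`, its parameters and the toy
alphabet string are illustrative assumptions, not part of the original protocol.

```python
def site_window(polyprotein: str, p1_index: int, n_upstream: int, n_downstream: int) -> str:
    """Return the residues from P{n_upstream} to P{n_downstream}' around a cleavage site.

    p1_index is the 0-based index of the P1 residue; the scissile bond lies
    between p1_index (P1) and p1_index + 1 (P1').
    """
    start = p1_index - n_upstream + 1
    end = p1_index + n_downstream + 1
    if start < 0 or end > len(polyprotein):
        raise ValueError("window extends beyond the sequence")
    return polyprotein[start:end]


# Toy sequence (hypothetical, not a viral sequence); P1 is the letter 'M'.
toy = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
p1 = toy.index("M")

luciferase_insert = site_window(toy, p1, n_upstream=5, n_downstream=3)   # P5..P3'
fret_peptide = site_window(toy, p1, n_upstream=12, n_downstream=8)       # P12..P8'
```

With this convention the same helper serves both assays; only the two window sizes
differ.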


Calculation of Michaelis constants. The initial slope (slope A, in RFU/min) was
obtained from the linear interval of the rising stage. Then, a linear equation was
fitted to the RFU at plateau (RFUmax) vs. the concentration of substrate. Its
slope (slope B, in RFU/[S]) gives the RFU change per unit change of [S]. The
initial reaction velocity (V0, in [S]/min) was calculated by dividing slope A by
slope B. The Michaelis-Menten kinetic constants were obtained from a Lineweaver-Burk
plot.
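The calculation described above can be sketched in a few lines. This is a minimal
illustration under stated assumptions, not the authors' actual analysis script: the
helper names (`fit_line`, `initial_velocities`, `lineweaver_burk`) are hypothetical,
the intermediate substrate concentrations are chosen arbitrarily within the stated
7.2-180 µM range, and the data are synthetic, noise-free values generated from an
assumed Km and Vmax purely to show that the procedure recovers them.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_velocities(slope_a, rfu_max, conc):
    """Convert initial slopes (RFU/min) into V0 (uM/min) using slope B (RFU/uM)."""
    slope_b, _ = fit_line(conc, rfu_max)
    return [a / slope_b for a in slope_a]


def lineweaver_burk(conc, v0, enzyme_conc):
    """Fit 1/V0 = (Km/Vmax) * (1/[S]) + 1/Vmax and return (Km, Vmax, kcat)."""
    slope, intercept = fit_line([1 / s for s in conc], [1 / v for v in v0])
    vmax = 1 / intercept
    km = slope / intercept
    return km, vmax, vmax / enzyme_conc


# Synthetic, noise-free data (NOT measurements from this study).
conc = [7.2, 18.0, 36.0, 90.0, 180.0]            # uM substrate (assumed series)
true_km, true_vmax, rfu_per_um = 50.0, 2.0, 10.0  # assumed values for the demo
v_true = [true_vmax * s / (true_km + s) for s in conc]
slope_a = [rfu_per_um * v for v in v_true]        # initial slopes, RFU/min
rfu_max = [rfu_per_um * s for s in conc]          # plateau signal, RFU

v0 = initial_velocities(slope_a, rfu_max, conc)
km, vmax, kcat = lineweaver_burk(conc, v0, enzyme_conc=16.3)
```

Dividing slope A by slope B is what converts the fluorescence rate into a
concentration rate; the double-reciprocal fit then yields Km and Vmax from the slope
and intercept of the line.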
Results

The coronavirus 3CLpros and their cleavage sites are evolutionarily conserved 
among different genera. 

To study the genetic diversity and evolution of the 3CLpro cleavage sites of
coronavirus pp1a/1ab, 308 primary sequences of 3CLpro cleavage sites (ranging
from P4 to P2') from 28 species of coronaviruses were collected and listed in Tables
S1-S4, including the predicted and verified cleavage sites. The 11 canonical cleavage
sites of each coronavirus were joined end to end to produce a spliced sequence, which
was then used to produce a phylogenetic tree (Fig. 1A). In addition, the sequences of
all coronavirus 3CLpros were used to generate another phylogenetic tree (Fig. 1B). The
analyses showed that the phylogenetic distances and taxonomic positions of each
virus, in both phylogenetic trees, were mostly consistent with the classification of
the International Committee on Taxonomy of Viruses (ICTV)
(http://www.ictvonline.org/virusTaxonomy.asp). These results imply that the
cleavage sites of coronaviral 3CLpros might co-evolve with the 3CLpros, and that both
3CLpro and its cleavage sites are relatively conserved among
different genera of coronaviruses. However, on the phylogenetic tree generated from the
3CLpro cleavage sites (Fig. 1A), the members of the genus Gammacoronavirus,
although clustered closely, are split between the alphacoronaviruses and
deltacoronaviruses, suggesting that the cleavage sites of gammacoronaviruses may have
undergone recombination events during evolution.
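The splicing step used for Fig. 1A can be illustrated as follows. This is only a
sketch: the text does not state which tree-building method was used, so the simple
p-distance below is a stand-in for whatever distance or likelihood model was actually
applied, and the site sequences are hypothetical placeholders rather than entries from
Tables S1-S4.

```python
def spliced_sequence(sites):
    """Join cleavage-site sequences (each P4..P2') end to end."""
    return "".join(sites)


def p_distance(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical 6-residue sites for two viruses (two sites each for brevity;
# the study joined 11 sites per virus, giving 66 residues).
virus_a = spliced_sequence(["AVLQSG", "TVLQAG"])
virus_b = spliced_sequence(["TVLQAG", "TVLQSG"])
distance = p_distance(virus_a, virus_b)
```

A matrix of such pairwise distances is one common input for distance-based tree
construction.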


Setup of the prediction conditions for coronavirus 3CLpro cleavage sites.

In order to develop an optimized method for cleavage site prediction that covers all
possible cleavage sites with fewer false positives, we set three levels of criteria
(stringent, moderate and mild) for cleavage site prediction. Under the stringent rules,
3CLpro cleavage sites comprise only the most preferred residues at each position,
as previously described (Chuck et al., 2011). Under the moderate rules, 3CLpro
cleavage sites comprise residues that have ever appeared in the cleavage sequences of
congeneric coronaviruses at each particular position. Under the mild rules, the
cleavage sites may comprise any residue ever found in the cleavage sequences of any
coronavirus at each particular position. Because the substrate preference at P4 and
P2' is not strong, we adopted two different lengths of cleavage sequence for
prediction, one containing 6 residues from position P4 to P2' and the other containing
4 residues from position P3 to P1'. These two lengths of cleavage sequence,
combined with the three criteria, made up a total of six search conditions for
cleavage site prediction with decreasing stringency. The canonical
cleavage sites of 3CLpro for these 7 groups of coronaviruses were summarized in
Tables S1-S4 and used to set conditions III to VI. The possible residues at each
position of the 3CLpro cleavage sites were derived under all six conditions to make
the cleavage site profile of coronavirus 3CLpros (Table S5). In principle,
condition I identifies the fewest possible cleavage sites in a scanned sequence, while
condition VI predicts the largest number.
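The scanning rule described above amounts to a position-wise residue filter. The
sketch below illustrates the moderate rule on a 4-residue window (P3 to P1'); it is an
assumption-laden illustration rather than the authors' program. The function names
`build_profile` and `scan`, the two "known" sites and the toy sequence are all
hypothetical; real profiles would be built from the sites in Tables S1-S4.

```python
def build_profile(known_sites):
    """Collect, for each position, every residue seen in the known cleavage sites."""
    width = len(known_sites[0])
    if any(len(site) != width for site in known_sites):
        raise ValueError("all sites must have the same length")
    return [{site[k] for site in known_sites} for k in range(width)]


def scan(polyprotein, profile, p1_offset):
    """Return (P1 index, window) for every window whose residues all fit the profile.

    p1_offset is the position of P1 inside the window
    (2 for a P3..P1' window, 3 for a P4..P2' window).
    """
    width = len(profile)
    hits = []
    for i in range(len(polyprotein) - width + 1):
        window = polyprotein[i:i + width]
        if all(res in allowed for res, allowed in zip(window, profile)):
            hits.append((i + p1_offset, window))
    return hits


# Hypothetical P3..P1' sites standing in for those of congeneric coronaviruses.
known_sites = ["VLQS", "LMQA"]
profile = build_profile(known_sites)

toy_polyprotein = "MKVLQSGGLLQATTVMQC"   # made-up sequence for illustration
hits = scan(toy_polyprotein, profile, p1_offset=2)
```

Note that the second hit, `LLQA`, is not one of the known sites: because each position
is checked independently, the filter also accepts new combinations of previously
observed residues, which is how putative non-canonical sites arise. Shortening the
window or widening the residue sets loosens the filter, which matches the decreasing
stringency from condition I to condition VI.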


To test their applicability, we applied all six conditions to 5 representative
coronaviruses: HCoV 229E from alphacoronavirus, MHV from
betacoronavirus lineage A, SARS-CoV from betacoronavirus lineage B, MERS-CoV
from betacoronavirus lineage C and IBV from gammacoronavirus. The pp1a/1ab of each of
the five representative coronaviruses was scanned for all possible cleavage sites
under each condition and the results were summarized in Table 1. As shown in
Table 1, increasing numbers of cleavage sites were found for each coronavirus as
conditions I to VI were applied. Conditions I and II were
too strict to cover all 11 canonical cleavage sites; conditions V and VI were so loose
that they produced 2-3 times more than 11 cleavage sites; condition III could
cover the canonical cleavage sites only for SARS-CoV; only condition IV generated an
appropriate number of cleavage sites for all 5 coronaviruses. Therefore, search
condition IV was chosen for further analysis of the cleavage sites of MERS-CoV.
By applying search condition IV, 9 putative cleavage sites (PSs) as well as the 11
canonical cleavage sites (CSs) were predicted (Table 2). Although the canonical
cleavage sites of MERS-CoV 3CLpro have been predicted by sequence alignment
with other coronaviruses (van Boheemen et al., 2012), our results suggest that
additional cleavage might occur during MERS-CoV pp1a/1ab processing.


Activity of MERS-CoV 3CLpro in biochemical assays.

To verify the activity of MERS-CoV 3CLpro and the cleavability of the predicted
cleavage sites, biochemical assay systems for MERS-CoV 3CLpro were
established. As shown in Fig. 2A and 2B, we first expressed and purified MERS-CoV
3CLpro (nsp5) with different tags and mutations: N-terminally GST-tagged nsp5
(Gnsp5, 60.4 kDa), N-terminally His-tagged nsp5 (Hnsp5, 36.9 kDa; 34 extra amino acids
comprising the 6×His tag and linker provided by vector pET-28a), Hnsp5 with the
catalytic residue mutation C148A (Hnsp5m, 36.9 kDa) (Kilianski et al., 2013) and GST
tag-GVLQ-nsp5 with the C148A mutation and a 6×His tag (Gnsp5mH, 61.6 kDa), in which
the sequence motif GVLQ represents the last four residues of MERS-CoV nsp4,
mimicking the MERS-CoV nsp4/nsp5 cleavage site. In the biochemical assays,
Gnsp5mH, which carries the catalytic residue mutation C148A, did not undergo
self-cleavage at the cleavage site to release GST during incubation for 16 hours
(Fig. 2C), indicating that the 3CLpro activity of MERS-CoV nsp5 in Gnsp5mH was
abolished by the C148A mutation. Thus, Gnsp5mH was used as the protease substrate in
the following biochemical assays. To verify the 3CLpro activity of the recombinant
nsp5s, Gnsp5 and Hnsp5 were incubated with substrate Gnsp5mH for 5 minutes to 16 hours
and analyzed by SDS-PAGE (Fig. 2D) and Western blotting (Fig. 2E), respectively. Both
Gnsp5 and Hnsp5 showed proteolytic activity, cleaving the substrate Gnsp5mH into two
parts, GST (26.0 kDa) and nsp5mH (34.1 kDa), as confirmed by their molecular weights
(Figs. 2D and 2E). However, the 3CLpro activity of Gnsp5
was obviously weaker than that of Hnsp5, which cleaved the substrate Gnsp5mH
entirely by 2 hours post treatment (Figs. 2D and 2E). This difference may be explained
by the larger fusion tag at the N terminus of MERS-CoV 3CLpro significantly
reducing its proteolytic activity, consistent with a previous
observation (Xue et al., 2007). In the biochemical assays, a relatively lower
proteolytic activity of 3CLpro makes it easier to observe the influence of different
substrates. Therefore, both recombinant Gnsp5 and Hnsp5 were used as MERS-CoV
3CLpro in the following studies.


Identification of the cleavability of predicted cleavage sites in MERS-CoV
pp1a/1ab.

To rapidly evaluate the proteolytic activity of MERS-CoV 3CLpro towards the
predicted cleavage sites, a sensitive luciferase-based biosensor
assay was adopted. As shown in Fig. 3A, the canonical cleavage sites (CS) of
MERS-CoV nsp4/nsp5 (CS4/5) and nsp5/nsp6 (CS5/6), which were experimentally
confirmed in a previous study (Kilianski et al., 2013), were inserted into the inverted
and circularly permuted luciferase construct pGlo-10F, in which the N-terminal and
C-terminal halves of the luciferase gene are separated. The resulting luciferase
produced in the in vitro translation system was inactive and was converted into an
active luciferase when cleaved by recombinant viral protease at the engineered cleavage
sites (such as CS4/5 and CS5/6). In this system, luciferase signals were detected upon
incubation with either Gnsp5 or Hnsp5 (Fig. 3B). In contrast, the mutated nsp5
(Hnsp5m) could not convert the inactive luciferase into the active form (Fig. 3B). This
result indicated that the luciferase-based biosensor assay could be used to evaluate the
proteolytic activity of MERS-CoV 3CLpro. Then, the other 9 canonical cleavage sites
and the 9 putative cleavage sites, each comprising 8 aa from MERS-CoV pp1a/1ab, were
inserted into the luciferase construct pGlo-10F, and the luciferase-based biosensor
assays were performed using Hnsp5 and Hnsp5m, respectively. As shown in Fig. 3C,
all 11 canonical cleavage sites of MERS-CoV 3CLpro generated a luciferase signal
with Hnsp5 at least 6.6 times higher than with the inactive Hnsp5m, indicating that all
these canonical sites could be cleaved by MERS-CoV 3CLpro. These results
experimentally verified the 11 predicted canonical cleavage sites.
Interestingly, among the 9 putative cleavage sites, the luciferase signals of PS1-1,
PS3-1 and PS3-3 increased more than 70-fold when incubated with
Hnsp5, indicating that these putative cleavage sites, located inside nsp1 and nsp3 of
MERS-CoV respectively, might be cleavable (Fig. 3D). The other 6 predicted putative
sites (PS3-2, PS5-1, PS6-1, PS12-1, PS13-1 and PS16-1) showed a less than 2.5-fold
increase of luciferase signal when treated with Hnsp5 compared with
Hnsp5m (Fig. 3C and 3D). Given the high sensitivity of the luciferase-based
biosensor assay and the fact that the confirmed canonical cleavage sites generated at
least a 6.6-fold increase of luciferase signal, the signal of these six sites may
represent the background level, indicating that they are likely uncleavable per se.
These results suggest that previously unrecognized 3CLpro cleavage sites, regarded
here as non-canonical cleavage sites, may exist inside the nsps.
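The comparison above reduces to a fold-change calculation against the inactive mutant,
followed by a cut-off at the lowest value seen for a confirmed canonical site. A
minimal sketch follows; the function names and the raw luminescence readings are
hypothetical, and only the 6.6-fold cut-off is taken from the text.

```python
THRESHOLD = 6.6  # lowest fold increase reported among the 11 canonical sites


def fold_change(active_signal, inactive_signal):
    """Luminescence with active Hnsp5 divided by luminescence with Hnsp5m."""
    if inactive_signal <= 0:
        raise ValueError("inactive-control signal must be positive")
    return active_signal / inactive_signal


def classify(fold, threshold=THRESHOLD):
    """Call a site cleavable when its fold increase reaches the threshold."""
    return "cleavable" if fold >= threshold else "background"


# Hypothetical readings (Hnsp5, Hnsp5m) in arbitrary luminescence units.
readings = {
    "site_A": (72000.0, 1000.0),   # about 72-fold
    "site_B": (2400.0, 1000.0),    # about 2.4-fold
}
calls = {name: classify(fold_change(a, i)) for name, (a, i) in readings.items()}
```

Because the cut-off is anchored to the weakest confirmed canonical site, it is an
empirical choice for this assay rather than a general constant.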


Analysis of the substrate specificity of MERS-CoV 3CLpro. 

The substrate specificity of coronavirus 3CLpro is determined by the residues from
the P4 to P2' positions of the cleavage sites, especially the P1, P2 and P1'
positions; knowledge of this specificity would benefit cleavage site prediction and the
design of broad-spectrum inhibitors of coronavirus 3CLpros (Chuck et al., 2011; Hegyi
and Ziebuhr, 2002). Previous studies demonstrated that the different canonical cleavage
sites of some representative coronaviruses are not equally susceptible to proteolysis
by recombinant 3CLpro (Fan et al., 2004; Hegyi and Ziebuhr, 2002). To define the
susceptibility of the canonical cleavage sites and the substrate specificity of
MERS-CoV 3CLpro, 20-mer peptides representing the corresponding canonical cleavage
sites of MERS-CoV 3CLpro were synthesized and modified with N-terminal
DABCYL and C-terminal Glu-EDANS (Fig. 4A). The fluorophore EDANS and the
quencher DABCYL are widely used in biochemical assays based on
fluorescence resonance energy transfer (FRET). As shown in Fig. 4B, the peptides
representing cleavage sites CS4/5 and CS5/6 were tested to optimize the FRET assay,
and the relative fluorescence unit (RFU) fold changes of both sites increased
significantly when incubated with Gnsp5 and Hnsp5. Although the FRET assay system is
more costly and less sensitive than the luciferase-based biosensor assay (Figs. 3B and
4B), it provides a continuous signal readout during the reaction, which makes it
possible to measure the kinetic characteristics of the protease towards different
substrates. The initial reaction rates (RFU/min) of all 11 canonical cleavage sites of
MERS-CoV were measured and are shown in Fig. 4C. The Michaelis constants, including
kcat, Km, kcat/Km and relative kcat/Km, were then calculated (Table 3). As shown in
Table 3, the substrate specificity of MERS-CoV 3CLpro is relatively conserved with
respect to other coronaviruses as previously reported (Fan et al., 2004; Hegyi and
Ziebuhr, 2002; Ziebuhr and Siddell, 1999). The relative kcat/Km values of CS4/5 and
CS5/6 indicate that the cleavage sites flanking MERS-CoV 3CLpro are converted
significantly faster than the other sites. The efficient proteolysis at the sites
flanking nsp5 implies that nsp5 (3CLpro) might be released from polyprotein 1a/1ab at
a very early stage of the maturation of the viral nsps, similar to HCoV,
TGEV, SARS-CoV and MHV (Fan et al., 2004; Hegyi and Ziebuhr, 2002). However,
the relative kcat/Km value of CS4/5 is lower than that of CS5/6 (Table 3), which
differs from other coronaviruses (Fan et al., 2004; Hegyi and Ziebuhr, 2002).
This could be explained by the residue Gly (G) at the P4 position of the cleavage site
between nsp4 and nsp5 of MERS-CoV reducing the protease activity of 3CLpro compared
with the residues Ser (S), Ala (A) and Thr (T) found in other coronaviruses (Tables
S1-S4), as previously described (Chuck et al., 2011). Whether this disparity plays any
role in the replication and pathogenesis of MERS-CoV is unknown.


Discussion 
The processing of viral polyprotein by 3CLpro is essential for the replication of
coronaviruses. Besides the 11 canonical cleavage sites of coronaviruses, some
additional cleavage sites inside nsps, so-called non-canonical cleavage sites, have
also been identified (Deming et al., 2007). More non-canonical 3CLpro
cleavage sites may therefore remain to be identified in different coronaviruses. In
this study, we designed 6 search conditions for predicting 3CLpro cleavage sites, among
which search condition IV provides a feasible way to reveal the potential cleavage
sites of 3CLpro within coronaviruses. Based on the genetic diversity of the different
coronavirus genera (Fig. 1), scanning condition IV adopts the residues that have ever
appeared in the 3CLpro cleavage sequences of congeneric coronaviruses at
positions P3 to P1'. In contrast, conditions I, II, III, V and VI were either too
restrictive or generated too many false positive outcomes (Table 1). In the suggested
condition IV, 4 residues from position P3 to P1' are used for the prediction of 3CLpro
cleavage sites. Measurement of the relative protease activities of 3CLpros from
different coronavirus genera against 19 amino acids × 8 positions of substrate variants
showed that the substrate specificity at positions P5, P2' and P3' is significantly
lower than at other positions (Chuck et al., 2011). Considering 6 or more
residues is therefore unnecessary and could lead to the omission of potential cleavage
sites (Table 1). Compared with previous work on the prediction and identification
of 3CLpro cleavage sites, scanning condition IV shows advantages. For
example, the two nonexistent putative cleavage sites predicted within nsp6 of IBV
(Gorbalenya et al., 1989; Liu et al., 1997; Ng and Liu, 1998) were avoided by our
prediction method (data not shown). Notably, the non-canonical cleavage site at the
end of MHV nsp7 identified by Deming et al. could be predicted using scanning
condition IV.


By using search condition IV, 9 putative cleavage sites were predicted in
MERS-CoV pp1ab in addition to the 11 canonical cleavage sites. The luciferase signal
of CS10/12 increased 6.6-fold when treated with nsp5 in the recombinant luciferase
cleavage assays, which is the lowest among the 11 canonical cleavage sites (Fig. 3C).
Therefore, the 6.6-fold increase of luciferase signal was used arbitrarily as a
threshold for judging positive and negative. Among the 9 predicted putative cleavage
sites, three sites (PS1-1, PS3-1 and PS3-3) showed signals at least 70 times
above the background (Fig. 3D) and were therefore regarded as cleavable sites. The
signal increase of the other 6 predicted putative cleavage sites was less than 2.5
times (Fig. 3D). They were therefore regarded as non-cleavable sites and thus as false
positives of the prediction. Interestingly, the homologous sequences of PS1-1 and
PS3-1 are conserved in lineage C of betacoronavirus, including MERS-CoV, BatCoV
HKU4 and BatCoV HKU5 (Figs. 5A and 5B). However, PS3-3 is a sequence unique to
MERS-CoV (Fig. 5C). Moreover, the cleavability of a cleavage site in biochemical
assays is a necessary but not sufficient condition for its physiological use during
viral infection. A predicted cleavage site may or may not be accessible to a protease.
The 3D structure model of the MERS-CoV ADP-ribose-1''-monophosphatase (ADRP)
domain built by comparative protein modeling and the papain-like protease (PLpro)
domain (Bailey-Elkin et al., 2014) showed that both PS3-1 and PS3-3 are located at
the surface of the ADRP and PLpro domains, opposite to the enzymatic active centers
(Figs. 5D and 5E), suggesting that these two sites are likely accessible to the
proteinase. Most recently, the crystal structure of MERS-CoV 3CLpro was
determined (Needle et al., 2015). Although PS5-1 is also located at the surface of
MERS-CoV 3CLpro, self-cleavage of MERS-CoV nsp5 was not observed in this
study (Fig. 2). Therefore, the threshold we proposed in the luciferase-based biosensor
system to exclude false positive prediction results is reasonable (Fig. 3D).
However, further studies are needed to identify the predicted cleavage products in
cells infected by MERS-CoV. Currently, such work with live MERS-CoV is
limited in our research facilities due to biosafety rules, but it can be addressed in
collaboration in the future.


Notably, the outcomes of the two cleavage assay systems were different. The signal
fold change of the highly sensitive luciferase-based biosensor assay depends on the
accumulation of active luciferase cleaved by nsp5 during 1 hour (Materials and
Methods section), while the outcome of the FRET assay is an instantaneous relative
fluorescence unit (RFU) signal. The RFU/min is the initial speed of the reaction,
which reflects but does not equal the efficiency of the cleavage. These differences may
be caused by the steric hindrance of the luciferase subunits, the distance between the
fluorophore and quencher of the substrates for the FRET assay, and substrate solubility.
Therefore, the activities observed in the two systems cannot be compared
directly. Based on the characteristics of the two cleavage assay systems, the highly
sensitive luciferase-based biosensor assay might be more suitable for high-throughput
screening of predicted putative protease cleavage sites, while the FRET assay is better
suited to cleavage kinetic analysis.


According to the Michaelis constants of MERS-CoV, the substrate specificity of
MERS-CoV 3CLpro is relatively conserved with respect to other coronaviruses (Fan et al., 2004;
Hegyi and Ziebuhr, 2002). Notably, Pro (P) has been selected through
evolution at position P2 of the cleavage site between nsp10 and nsp12 (CS10/12) of
lineage C betacoronaviruses, although it is not preferred by 3CLpro according to a
previous study (Chuck et al., 2011). However, the relative kcat/Km value of
MERS-CoV CS10/12 is 0.053, which is 26.5-fold higher than that of SARS-CoV (Fan
et al., 2004). This indicates that the substrate preferences at some cleavage sites can
still vary among different genera of coronaviruses, and that the proposed scanning
condition IV, which admits residues that have ever appeared in the cleavage sequences of
congeneric coronaviruses, is reasonable.
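As a quick check of the fold comparison above, the SARS-CoV value implied by the two numbers quoted in the text can be back-calculated. This is only arithmetic on the quoted figures, not a value taken directly from Fan et al. (2004):

```latex
\left(\frac{k_{cat}}{K_m}\right)_{\mathrm{rel,\ SARS\text{-}CoV}}
  \approx \frac{0.053}{26.5} = 0.002
```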


In summary, we proposed an optimized search condition for predicting cleavage sites
of coronavirus 3CLpro. We verified the 11 canonical cleavage sites of pp1ab in
biochemical assays. We further identified 3 non-canonical cleavage sites in the nsps of
MERS-CoV. The results provide clues for the possible identification of novel cleavage
products of coronavirus nsps and will benefit studies of the mechanisms of
coronavirus replication.

Conclusions

Processing of polyprotein 1a/1ab by 3CLpro is essential in the coronavirus life cycle. The
3CLpro cleavage site prediction methods established in previous studies focus on
accuracy, so some non-canonical cleavage sites were missed. In this study, we
built a moderate prediction method that balances accuracy against false positive
outcomes. Using this method, 9 putative cleavage sites, in addition to the 11 canonical
sites, were predicted in MERS-CoV pp1ab and the cleavability of 3 of them was
experimentally confirmed. Interestingly, all 3 of these non-canonical cleavage sites are
located upstream of nsp4, in contrast with the previous understanding that the
coronavirus 3CL protease cleaves only from nsp4 to nsp16. This suggests a novel role
of 3CLpro in coronavirus pp1a/1ab processing. However, the cleavability of these
putative cleavage sites needs to be further verified in the viral proteins of
MERS-CoV-infected cells. Finally, the catalytic constants of the 11 canonical
cleavage sites of MERS-CoV 3CLpro showed that its substrate specificity is conserved
relative to other members of the Coronaviridae.

Figure legends

Fig. 1. The phylogenetic tree of 26 representative coronaviruses. (A) The tree was
generated using an alignment of the concatenated canonical cleavage sites of 26
coronaviruses. Sequence alignment was performed with ClustalX 2.0, and the tree was
built by the neighbor-joining method in MEGA 4 (Bootstrap: replication = 1000, random
seed = 64238). (B) The tree was generated from the sequence of nsp5 using the same
method as described above.


Fig. 2. Purification of recombinant nsp5 of MERS-CoV and analysis of substrate
cleavage by protein cleavage assays. (A) Diagram of the 4 recombinant proteins. The
catalytic residue mutation C148A is indicated by a small black triangle. GVLQ
(P4-P1) are the last four residues of MERS-CoV nsp4. The insertion of these 4
residues made the N-terminal GST tag cleavable by active nsp5. The cleavage
position is indicated by a down arrow. (B) SDS-PAGE analysis of the recombinant
proteins. After purification, all of the proteins were concentrated to 1 mg/ml. 2 µg of
Gnsp5mH, Hnsp5 and Hnsp5m and 1 µg of Gnsp5 were loaded onto a 10% SDS-PAGE gel
and stained with Coomassie brilliant blue (CBB). (C) and (D) Gnsp5mH was incubated in 50
mM Tris, pH 7.5, 1 mM EDTA, and 50 µM DTT at 37°C alone (C) or with Gnsp5 and
Hnsp5 (D). The substrate protein was diluted to 0.1 mg/ml. A fraction of the reaction
mixture was taken out at each time point (0 min, 5 min, 1 h, 2 h, 4 h, 16 h) and
analyzed by 10% SDS-PAGE. Products were detected by CBB staining (D) and
Western blot (E).


Fig. 3. Identification of the cleavability of predicted cleavage sites in recombinant
luciferase cleavage assays. (A) Schematic diagram of the recombinant luciferase. (B)
Verification of the recombinant luciferase assays. Inactive luciferase was synthesized
in the cell-free translation system, and the reaction mixture was incubated at 25°C for 2
hours. After that, the protein mixture was divided into four parts and incubated with
1.63 µM Gnsp5, Hnsp5, Hnsp5m or H2O, respectively. After incubation for 1 hour at



30 °C, the reaction product was diluted 20-fold and mixed with an equal amount of
luciferase substrate. After incubation at room temperature for 5 min, the luciferase
luminescence was measured. The luciferase activation fold was calculated by
dividing the signal of the reaction treated with active Hnsp5 by that of the reaction
treated with the inactive nsp5 mutant Hnsp5m. (C) The luciferase cleavage assay of
the 11 predicted canonical cleavage sites and (D) the 9 putative cleavage sites. The luciferase
expression vectors carrying the cleavage sites were added to the wheat germ protein
translation mix and incubated at 25°C for 2 hours, and the reaction mixture was then
divided and treated with Hnsp5 and Hnsp5m, respectively. The dashed line indicates
the lowest fold increase of luciferase signal produced by cleavage of previously confirmed
3CLpro cleavage sites. The data presented here are the mean values ± SD derived from
three independent experiments.
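The activation fold and the dashed-line threshold described in this legend can be sketched in a few lines. This is a minimal illustration only: the function names and every luminescence value are hypothetical, and treating a fold equal to the threshold as a positive call is an assumption, not something stated in the text.

```python
from statistics import mean, stdev


def activation_fold(active_signal, mutant_signal):
    """Signal with active Hnsp5 divided by signal with inactive Hnsp5m."""
    if mutant_signal <= 0:
        raise ValueError("mutant (background) signal must be positive")
    return active_signal / mutant_signal


def summarize_folds(replicates):
    """Mean and sample SD of the fold over (active, mutant) replicate pairs."""
    folds = [activation_fold(a, m) for a, m in replicates]
    return mean(folds), stdev(folds)


def passes_threshold(mean_fold, confirmed_site_folds):
    """Compare a site with the lowest fold seen for previously confirmed sites.

    Using >= (rather than >) is an assumption made for this sketch.
    """
    return mean_fold >= min(confirmed_site_folds)


if __name__ == "__main__":
    # Hypothetical luminescence readings from three independent experiments.
    site = [(200.0, 100.0), (300.0, 100.0), (400.0, 100.0)]
    m, s = summarize_folds(site)
    print(f"fold = {m:.2f} +/- {s:.2f}")
    print(passes_threshold(m, confirmed_site_folds=[2.5, 10.0]))
```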


Fig. 4. Kinetic analyses of the 11 canonical cleavage sites cleaved by MERS-CoV
nsp5 in FRET assays. (A) Diagram of the FRET mechanism. Upon excitation at 340 nm,
EDANS transfers its 490 nm emission energy to DABCYL, so the emission is not
detected. After the peptide bond between P1 and P1' is cleaved by nsp5, the
separation abolishes the energy transfer and the 490 nm emission of EDANS can
be detected. (B) 180 µM synthesized peptide was incubated with 16.3 µM tagged nsp5.
After incubation for 2 hours at 37°C, the fluorescence (Ex/Em = 340 nm/490 nm) was
read by a luminometer. (C) The rate of RFU increase (slope = ΔRFU/min) in the linear
interval right after the reaction began. The data presented here are the mean values ±
SD derived from three independent experiments.
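The slope in panel (C) is the initial rate of RFU increase over the early linear interval. One common way to estimate such a slope is an ordinary least-squares fit over the first few time points. The sketch below assumes that approach; the function name, the choice of five points, and the readings are all hypothetical and not taken from the paper.

```python
def initial_rate(times_min, rfu, n_points=5):
    """Least-squares slope (RFU/min) over the first n_points readings.

    n_points should cover only the linear interval right after the
    reaction starts; choosing it is left to the user in this sketch.
    """
    t = list(times_min[:n_points])
    y = list(rfu[:n_points])
    if len(t) < 2 or len(t) != len(y):
        raise ValueError("need at least two paired readings")
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical trace: linear for the first five readings, then plateauing.
    times = [0, 1, 2, 3, 4, 5, 6]
    signal = [10, 30, 50, 70, 90, 100, 105]
    print(initial_rate(times, signal, n_points=5))
```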


Fig. 5. Conservation analysis and spatial location of the novel non-canonical
cleavage sites. Sequence alignment of the nsp regions covering the PS1-1 site (A), PS3-1
site (B) and PS3-3 site (C). The cleavage sites of MERS-CoV are indicated by black
boxes. (D) Homology-modeled structure of the ADRP domain of MERS-CoV (template:
2FAV). The ADRP domain is shown as a green ribbon. The putative cleavage site is
colored red and the Gln at the cleavage position is shown as sticks. The substrate
(ADP-ribose) of the ADRP domain is shown as sticks and colored by atom (C: cyan, O:
red, N: blue, P: orange). (E) Structure of the papain-like protease of MERS-CoV (4RF1).
The PLpro domain is shown as a cartoon and colored green. The ligand ubiquitin is
colored cyan. The putative cleavage site is colored red and the Gln at the cleavage
position is shown as sticks.

Tables
Table 1. The number of cleavage sites in pp1ab of 5 representative coronaviruses
predicted using 6 search conditions

                  HCoV 229E     MHV         SARS-CoV    MERS-CoV    IBV
                  CS(a) PS(b)   CS    PS    CS    PS    CS    PS    CS    PS
Condition I(c)    1     0       2     0     2     0     2     0     1     0
Condition II      1     0       4     1     3     0     4     0     2     0
Condition III     11    4       11    5     11    0     11    2     11    3
Condition IV      11    10      11    14    11    4     11    9     11    [illegible]
Condition V       11    9       11    17    11    11    11    12    11    11
Condition VI      11    15      11    23    11    19    11    19    11    13

(a) Canonical cleavage sites, which are located between recognized nsps.
(b) Putative cleavage sites, which are located inside various nsps.
(c) Six search conditions were designed: Conditions I, III & V cover 6 residues from P4 to
P2'; Conditions II, IV & VI cover 4 residues from P3 to P1'. Conditions I and II
comprise the most preferred residues at each position; Conditions III and IV
comprise residues that appear in the cleavage sites of congeneric coronaviruses;
Conditions V and VI comprise residues that appear in the cleavage sequences of any
coronaviruses.
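The search conditions in Table 1 amount to sliding a window of 4 residues (P3 to P1') or 6 residues (P4 to P2') along the polyprotein and keeping windows in which every position holds an allowed residue. The sketch below illustrates that idea only: the allowed-residue sets and the toy sequence are hypothetical placeholders, not the residue sets used for Conditions I to VI.

```python
def scan_cleavage_sites(sequence, allowed, n_before):
    """Return (P1 position, site string) for every matching window.

    allowed  : one set of permitted residues per window position
    n_before : number of residues on the P side (3 for P3..P1', 4 for P4..P2')
    The P1 position is 1-based; "|" marks the scissile bond.
    """
    width = len(allowed)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(res in ok for res, ok in zip(window, allowed)):
            site = window[:n_before] + "|" + window[n_before:]
            hits.append((start + n_before, site))
    return hits


if __name__ == "__main__":
    toy_sequence = "MSGVLQSGAAKLQNT"  # hypothetical, for illustration only

    # Hypothetical 4-residue condition (P3, P2, P1, P1').
    four_res = [set("VKRTAI"), set("LMVP"), set("Q"), set("SAGN")]
    print(scan_cleavage_sites(toy_sequence, four_res, n_before=3))

    # Hypothetical, stricter 6-residue condition (P4 .. P2').
    six_res = [set("GA"), set("V"), set("L"), set("Q"), set("S"), set("G")]
    print(scan_cleavage_sites(toy_sequence, six_res, n_before=4))
```

Covering fewer positions, or admitting more residues per position, returns more candidate sites. This is the trade-off between missed sites and false positives that the six conditions explore.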




Table 2. The cleavage site prediction outcomes of MERS-CoV using search condition IV

Canonical cleavage sites               Putative cleavage sites
Site       Position   Sequence*        Site      Position   Sequence
CS4/5      3247       GVLQ|SG          PS1-1     122        TTLQ|GK
CS5/6      3553       VVMQ|SG          PS3-1     1191       VLLQ|GH
CS6/7      3845       AAMQ|SK          PS3-2     1278       DIPQ|SL
CS7/8      3928       SVLQ|AT          PS3-3     1683       VVLQ|GL
CS8/9      4127       VKLQ|NN          PS5-1     3332       HAMQ|GT
CS9/10     4237       VRLQ|AG          PS6-1     3580       IILQ|AT
CS10/12    4377       ALPQ|SK          PS12-1    5076       NILQ|AT
CS12/13    5130       TTLQ|AV          PS13-1    5591       VTVQ|GP
CS13/14    5908       YKLQ|SQ          PS16-1    6793       FKVQ|NV
CS14/15    6432       TKVQ|GL
CS15/16    6775       PRLQ|AS

* The "|" indicates the cleavage position.
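To see how conserved each position is across the canonical sites, the sequences in Table 2 can be split at "|" into the positions P4 to P2' and tallied. The sketch below does this for the 11 canonical sequences as listed in the table. The helper names are hypothetical, and the tally is a reading aid rather than part of the authors' prediction method.

```python
from collections import Counter

POSITIONS = ("P4", "P3", "P2", "P1", "P1'", "P2'")

# Canonical MERS-CoV 3CLpro cleavage sequences as listed in Table 2.
CANONICAL_SITES = [
    "GVLQ|SG", "VVMQ|SG", "AAMQ|SK", "SVLQ|AT", "VKLQ|NN", "VRLQ|AG",
    "ALPQ|SK", "TTLQ|AV", "YKLQ|SQ", "TKVQ|GL", "PRLQ|AS",
]


def label_positions(site):
    """Map a 'P4P3P2P1|P1'P2'' string to {position name: residue}."""
    left, right = site.split("|")
    if len(left) != 4 or len(right) != 2:
        raise ValueError(f"expected 4 residues before and 2 after '|': {site}")
    return dict(zip(POSITIONS, left + right))


def tally_residues(sites):
    """Count how often each residue occurs at each position."""
    counts = {pos: Counter() for pos in POSITIONS}
    for site in sites:
        for pos, residue in label_positions(site).items():
            counts[pos][residue] += 1
    return counts


if __name__ == "__main__":
    for pos, counter in tally_residues(CANONICAL_SITES).items():
        print(pos, dict(counter))
```

In this tally P1 is Gln at every site, while P2 is mostly Leu, with Pro appearing only at CS10/12.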
Table 3. The Michaelis constants of the 11 canonical cleavage sites of MERS-CoV 3CLpro

Site       kcat (min^-1)       Km (µM)          kcat/Km (mM^-1 min^-1)   kcat/Km (rel)   P value*
CS4/5      0.3053±0.05661      75.89±17.57      4.023                    1               -
CS5/6      0.6811±0.1388       88.25±18.19      7.717                    1.9             0.015
CS6/7      0.2993±0.04865      264.7±36.95      1.131                    0.28            <0.0001
CS7/8      2.073±0.5245        321.89±97.63     6.441                    1.6             0.011
CS8/9      0.5161±0.04468      423.1±27.32      1.220                    0.30            <0.0001
CS9/10     2.390±0.2397        833.8±182.1      2.866                    0.71            0.103
CS10/12    0.1152±0.02049      534.9±91.71      0.2154                   0.053           <0.0001
CS12/13    0.1083±0.002443     83.90±3.949      1.290                    0.32            <0.0001
CS13/14    0.1815±0.0200       449.7±1.996      0.4036                   0.10            <0.0001
CS14/15    0.05115±0.00878     207.2±59.61      0.2469                   0.061           <0.0001
CS15/16    0.3849±0.01126      100.7±6.473      3.823                    0.95            0.58

* P value was statistically analyzed by unpaired Student's t-test.
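The last two numeric columns of Table 3 follow from the first two: kcat/Km is kcat divided by Km, with Km converted from µM to mM, and the relative value is that efficiency divided by the CS4/5 value. The sketch below reproduces the calculation for CS10/12 using the tabulated means. The function names are hypothetical, and small differences from the table are expected because the table was presumably computed from unrounded values.

```python
def catalytic_efficiency(kcat_per_min, km_micromolar):
    """kcat/Km in mM^-1 min^-1 from kcat (min^-1) and Km (µM)."""
    if km_micromolar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_min / (km_micromolar / 1000.0)


def relative_efficiency(efficiency, reference_efficiency):
    """Efficiency relative to a reference site (CS4/5 in Table 3)."""
    return efficiency / reference_efficiency


if __name__ == "__main__":
    cs10_12 = catalytic_efficiency(0.1152, 534.9)  # tabulated means, CS10/12
    cs4_5 = 4.023                                  # tabulated kcat/Km, CS4/5
    print(f"CS10/12 kcat/Km  = {cs10_12:.4f} mM^-1 min^-1")
    print(f"relative to CS4/5 = {relative_efficiency(cs10_12, cs4_5):.3f}")
```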